<!--------------------------------------------------------------------------->  
<!--                           INTRODUCTION                                

 The Code Project article submission template (HTML version)

Using this template will help us post your article sooner. To use, just 
follow the 3 easy steps below:
 
     1. Fill in the article description details
     2. Add links to your images and downloads
     3. Include the main article text

That's all there is to it! All formatting will be done by our submission
scripts and style sheets. 

-->  
<!--------------------------------------------------------------------------->  
<!--                        IGNORE THIS SECTION                            -->
<html>
<head>
<title>The Code Project</title>
<style>
BODY, P, TD { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt }
H2,H3,H4,H5 { color: #ff9900; font-weight: bold; }
H2 { font-size: 13pt; }
H3 { font-size: 12pt; }
H4 { font-size: 10pt; color: black; }
PRE { BACKGROUND-COLOR: #FBEDBB; FONT-FAMILY: "Courier New", Courier, monospace; WHITE-SPACE: pre; }
CODE { COLOR: #990000; FONT-FAMILY: "Courier New", Courier, monospace; }
</style>
<link rel="stylesheet" type=text/css href="http://www.codeproject.com/styles/global.css">
</head>
<body bgcolor="#FFFFFF">
<!--------------------------------------------------------------------------->  


<!-------------------------------     STEP 1      --------------------------->
<!--  Fill in the details (CodeProject will reformat this section for you) -->

<pre>Title:  A Naive Bayesian Classifier in C#
Author:      Erich Guenther 
Email:       erich_guenther@hotmail.com
Environment: Microsoft Visual C# 2005
Keywords:    Algorithm, Bayes, Classification
Level:       Intermediate
Description: An article on a naive Bayesian classifier in C#
Section      Miscellaneous
SubSection   General
</pre>

<!-------------------------------     STEP 2      --------------------------->
<!--  Include download and sample image information.                       --> 

<ul class=download>
<li ><a href="BayesClassifier/BayesClassifier.zip">Download source + demo project - 20 Kb </a></li>
</ul>

<p><img src="BayesClassifier/pict.gif" alt="Sample Image" ></p>


<!-------------------------------     STEP 3      --------------------------->
<!--  Add the article text. Please use simple formatting (<h2>, <p> etc)   --> 

<h2>Introduction</h2>

<p>
	I was looking for a way to classify short texts into several categories. A simple
	but probably sufficient method seemed to be naive Bayesian classification. Looking
	for a readily available library, I found many implementations in Perl
	or Java. The only CLR implementation I could find was NClassifier, but it did not
	support classification into multiple classes, so I decided to write my own.
</p>
<h2>Background</h2>

<p>
	There is plenty of information on the net describing the theory of Bayesian
	classification. <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Wikipedia</a> has a good introduction.</p>
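<p>
	In short, a naive Bayesian classifier scores a text for each category by combining the
	category's prior probability with the probability of each word given that category, treating
	the words as independent. The following is a minimal sketch of that idea, not necessarily
	how <code>BayesClassifier</code> computes its scores. The class <code>NaiveBayesSketch</code>,
	the method <code>LogScore</code>, its parameters, and the use of add-one (Laplace) smoothing
	are all assumptions made for illustration.
</p>
<pre>using System;
using System.Collections.Generic;

static class NaiveBayesSketch
{
    // Hypothetical helper, not part of BayesClassifier.
    // Returns log P(c) + sum over words of log P(w | c) for one category c.
    // Add-one smoothing keeps unseen words from driving the probability to zero.
    public static double LogScore(
        string[] words,                         // words of the text to classify
        Dictionary&lt;string, int&gt; wordCounts,     // word counts learned for the category
        int totalWordsInCategory,               // sum of all counts for the category
        int vocabularySize,                     // number of distinct words over all categories
        double prior)                           // P(c), must be greater than zero
    {
        double score = Math.Log(prior);
        foreach (string word in words)
        {
            int count;
            wordCounts.TryGetValue(word, out count); // count is 0 when the word is unknown
            score += Math.Log((count + 1.0) / (totalWordsInCategory + vocabularySize));
        }
        return score;
    }
}
</pre>
<p>
	Under these assumptions, the category with the highest log-score is the predicted one. Summing
	logarithms instead of multiplying probabilities avoids numeric underflow on longer texts.
</p>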
<h2>Using the code</h2>

<p>
	First, create an instance of <code>BayesClassifier.Classifier</code>.
</p><pre>BayesClassifier.Classifier m_Classifier = new BayesClassifier.Classifier();</pre><p>
		Tip: experiment with <code>BayesClassifier.ExcludedWords</code> to define the words
		that you consider irrelevant for your classification. Excluding them leads to smaller dictionaries
		and can therefore speed up the classification.</p>
<p>
	Then define the categories and teach each category:

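</p><pre>// Teach "Cat1" from a reader; "file" refers to the training text (a path or stream).
m_Classifier.TeachCategory("Cat1", new System.IO.StreamReader(file));
// Teach "Cat2" from phrases held in memory.
m_Classifier.TeachPhrases("Cat2", new string[] { "Hi", "HoHo" });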
</pre><p>
	Finally, the method <code>BayesClassifier.Classifier.Classify</code> returns the classification result as a <code>Dictionary&lt;string, double&gt;</code> of scores.
</p><pre>Dictionary&lt;string, double&gt; score = m_Classifier.Classify(new System.IO.StreamReader(file));
</pre>
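<p>
	As a minimal sketch of how the result might be consumed, the snippet below picks the
	highest-scoring entry from such a dictionary. It assumes that the keys are category names and
	that a higher score means a better match; neither is confirmed above, and the class
	<code>ScoreHelper</code> with its method <code>BestCategory</code> is hypothetical.
</p>
<pre>using System;
using System.Collections.Generic;

static class ScoreHelper
{
    // Hypothetical helper, not part of BayesClassifier.
    // Returns the key with the highest score, or null when the dictionary is empty.
    // Assumes that a higher score means a better match.
    public static string BestCategory(Dictionary&lt;string, double&gt; score)
    {
        string best = null;
        double bestScore = double.NegativeInfinity;
        foreach (KeyValuePair&lt;string, double&gt; entry in score)
        {
            if (best == null || entry.Value &gt; bestScore)
            {
                best = entry.Key;
                bestScore = entry.Value;
            }
        }
        return best;
    }
}
</pre>
<p>
	With the call above, <code>string category = ScoreHelper.BestCategory(score);</code> would then
	yield the name of the best-matching category under these assumptions.
</p>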
<p>Let me know if you have any questions or suggestions, or any experience with the applicability of the naive Bayesian approach, since the
	(wrong) assumption of word independence might turn out to influence the result.</p>


<h2>History</h2>
<ul>
<li>28.May.2006 Version 1.0</li>
</ul>
<!-------------------------------    That's it!   --------------------------->
</body>
</html>
